Abstract

Probabilistic Logic Programming (PLP), exemplified by Sato and Kameya's PRISM,
Poole's ICL, Raedt et al.'s ProbLog and Vennekens et al.'s LPAD, is aimed at
combining statistical and logical knowledge representation and inference.
However, the inference techniques used in these works rely on enumerating sets
of explanations for a query answer. Consequently, these languages permit very
limited use of random variables with continuous distributions. In this paper,
we present a symbolic inference procedure that uses constraints and represents
sets of explanations without enumeration. This permits us to reason over PLPs
with Gaussian or Gamma-distributed random variables (in addition to
discrete-valued random variables) and linear equality constraints over reals.
We develop the inference procedure in the context of PRISM; however, the
procedure's core ideas can be easily applied to other PLP languages as well.
An interesting aspect of our inference procedure is that PRISM's query
evaluation process becomes a special case in the absence of any continuous
random variables in the program. The symbolic inference procedure enables us
to reason over complex probabilistic models such as Kalman filters and a large
subclass of Hybrid Bayesian networks that were hitherto not possible in PLP
frameworks.
(To appear in Theory and Practice of Logic Programming) 



1 Introduction 

Logic Programming (LP) is a well-established language model for knowledge
representation based on first-order logic. Probabilistic Logic Programming
(PLP) is a class of Statistical Relational Learning (SRL) frameworks
(Getoor and Taskar 2007) which are designed for combining statistical and
logical knowledge representation.

The semantics of PLP languages is defined based on the semantics of the
underlying non-probabilistic logic programs. A large class of PLP languages,
including ICL (Poole 2008), PRISM (Sato and Kameya 1997), ProbLog
(Raedt et al. 2007) and LPAD (Vennekens et al. 2004), have a declarative
distribution semantics, which defines a probability distribution over possible
models of the program. Operationally, the combined statistical/logical
inference is performed based on the proof structures analogous to those
created by purely logical inference. In particular, inference proceeds as in
traditional LPs except when a random variable's valuation is used. Use of a
random variable creates a branch in the proof structure, one branch for each
valuation of the variable. Each proof for an answer is associated with a
probability based on the random variables used in the proof and their
distributions; an answer's probability is determined by the probability that
at least one proof holds. Since the inference is based on enumerating the
proofs/explanations for answers, these languages have limited support for
continuous random variables. We address this problem in this paper. A
comparison of our work with recent efforts at extending other SRL frameworks
to continuous variables appears in Section 2.

We provide an inference procedure to reason over PLPs with Gaussian or
Gamma-distributed random variables (in addition to discrete-valued ones), and
linear equality constraints over values of these continuous random variables.
We describe the inference procedure based on extending PRISM with continuous
random variables. This choice is based on the following reasons. First, the
use of explicit random variables in PRISM simplifies the technical
development. Second, standard statistical models such as Hidden Markov Models
(HMMs), Bayesian Networks and Probabilistic Context-Free Grammars (PCFGs) can
be naturally encoded in PRISM. Along the same lines, our extension permits
natural encodings of Finite Mixture Models (FMMs) and Kalman Filters. Third,
PRISM's inference naturally reduces to the Viterbi algorithm (Forney 1973)
over HMMs, and the Inside-Outside algorithm (Lari and Young 1990) over PCFGs.
The combination of well-defined model theory and efficient inference has
enabled the use of PRISM for synthesizing knowledge in sensor networks
(Singh et al. 2008).

It should be noted that, while the technical development in this paper is
limited to PRISM, the basic technique itself is applicable to other similar
PLP languages such as ProbLog and LPAD (see Section 7).

Our Contribution: We extend PRISM at the language level to seamlessly include 
discrete as well as continuous random variables. We develop a new inference proce- 
dure to evaluate queries over such extended PRISM programs. 

• We extend the PRISM language for specifying distributions of continuous 
random variables, and linear equality constraints over such variables. 

• We develop a symbolic inference technique to reason with constraints on the 
random variables. PRISM's inference technique becomes a special case of our 
technique when restricted to logic programs with discrete random variables. 

• These two developments enable the encoding of rich statistical models such 
as Kalman Filters and a large class of Hybrid Bayesian Networks; and exact 
inference over such models, which were hitherto not possible in LP and its 
probabilistic extensions. 

Note that the technique of using PRISM for in-network evaluation of queries in
a sensor network (Singh et al. 2008) can now be applied directly when sensor
data and noise are continuously distributed. Tracking and navigation problems
in sensor networks are special cases of the Kalman Filter problem
(Chu et al. 2007). There are a number of other network inference problems,
such as the indoor localization problem, that have been modeled as FMMs
(Goswami et al. 2011). Moreover, our extension permits reasoning over models
with finite mixture of Gaussians and discrete distributions (see Section 7).
Our extension of PRISM brings us closer to the ideal of finding a declarative
basis for programming in the presence of noisy data.

The rest of this paper is organized as follows. We begin with a review of
related work in Section 2 and describe the PRISM framework in detail in
Section 3. We introduce the extended PRISM language and the symbolic inference
technique for the extended language in Section 5. In Section 6 we show the use
of this technique on an example encoding of the Kalman Filter. We conclude in
Section 7 with a discussion on extensions to our inference procedure.

2 Related Work 

Over the past decade, a number of Statistical Relational Learning (SRL)
frameworks have been developed, which support modeling, inference and/or
learning using a combination of logical and statistical methods. These
frameworks can be broadly classified as statistical-model-based or
logic-based, depending on how their semantics is defined. In the first
category are frameworks such as Bayesian Logic Programs (BLPs)
(Kersting and Raedt 2000), Probabilistic Relational Models (PRMs)
(Friedman et al. 1999), and Markov Logic Networks (MLNs)
(Richardson and Domingos 2006), where logical relations are used to specify a
model compactly. A BLP consists of a set of Bayesian clauses (constructed from
Bayesian network structure), and a set of conditional probabilities
(constructed from CPTs of Bayesian network). PRMs encode discrete Bayesian
Networks with Relational Models/Schemas. An MLN is a set of formulas in first
order logic associated with weights. The semantics of a model in these
frameworks is given in terms of an underlying statistical model obtained by
expanding the relations.

Inference in SRL frameworks such as PRISM (Sato and Kameya 1997),
Stochastic Logic Programs (SLP) (Muggleton 1996), Independent Choice Logic
(ICL) (Poole 2008), and ProbLog (Raedt et al. 2007) is primarily driven by
query evaluation over logic programs. In SLP, clauses of a logic program are
annotated with probabilities, which are then used to associate probabilities
with proofs (derivations in a logic program). ICL (Poole 1993) consists of
definite clauses and disjoint declarations of the form
$disjoint([h_1 : p_1, \ldots, h_n : p_n])$ that specify a probability
distribution over the hypotheses (i.e., $\{h_1, \ldots, h_n\}$). Any
probabilistic knowledge representable in a discrete Bayesian network can be
represented in this framework. While the language model itself is restricted
(e.g., ICL permits only acyclic clauses), it had a declarative distribution
semantics. This semantic foundation was later used in other frameworks such as
PRISM and ProbLog. CP-Logic (Vennekens et al. 2009) is a logical language to
represent probabilistic causal laws, and its semantics is equivalent to a
probability distribution over well-founded models of certain logic programs.
Specifications in LPAD (Vennekens et al. 2004) resemble those in CP-Logic:
probabilistic predicates are specified with disjunctive clauses, i.e. clauses
with multiple disjunctive consequents, with a distribution defined over the
consequents. LPAD has a distribution semantics, and a proof-based operational
semantics similar to that of PRISM. ProbLog specifications annotate facts in a
logic program with probabilities. In contrast to SLP, ProbLog has a
distribution semantics and a proof-based operational semantics. PRISM
(discussed in detail in the next section), LPAD and ProbLog are equally
expressive. PRISM uses explicit random variables and a simple but restricted
inference procedure. In particular, PRISM demands that the set of proofs for
an answer are pairwise mutually exclusive, and that the set of random
variables used in a single proof are pairwise independent. The inference
procedures of LPAD and ProbLog lift these restrictions.

SRL frameworks that are based primarily on statistical inference, such as
BLP, PRM and MLN, were originally defined over discrete-valued random
variables, and have been naturally extended to support a combination of
discrete and continuous variables. Continuous BLP (Kersting and Raedt 2001)
and Hybrid PRM (Narman et al. 2010) extend their base models by using Hybrid
Bayesian Networks (Murphy 1998). Hybrid MLN (Wang and Domingos 2008) allows
description of continuous properties and attributes (e.g., the formula
$length(x) = 5$ with weight $w$) deriving MRFs with continuous-valued nodes
(e.g., $length(a)$ for a grounding of $x$, with mean 5 and standard deviation
$1/\sqrt{2w}$).

In contrast to BLP, PRM and MLN, SRL frameworks that are primarily based
on logical inference offer limited support for continuous variables. In fact,
among such frameworks, only ProbLog has been recently extended with continuous
variables. Hybrid ProbLog (Gutmann et al. 2010) extends ProbLog by adding a
set of continuous probabilistic facts (e.g., $(X_i, \phi_i) :: f_i$, where
$X_i$ is a variable appearing in atom $f_i$, and $\phi_i$ denotes its Gaussian
density function). It adds three predicates, namely below, above and
ininterval, to the background knowledge to process values of continuous facts.
A ProbLog program may use a continuous random variable, but further processing
can be based only on testing whether or not the variable's value lies in a
given interval. As a consequence, statistical models such as Finite Mixture
Models can be encoded in Hybrid ProbLog, but others such as certain classes of
Hybrid Bayesian Networks (with continuous child with continuous parents) and
Kalman Filters cannot be encoded. The extension to PRISM described in this
paper makes the framework general enough to encode such statistical models.

More recently, Gutmann et al. (2011) introduced a sampling based approach for
(approximate) probabilistic inference in a ProbLog-like language that combines
continuous and discrete random variables. The inference algorithm uses forward
chaining and rejection sampling. The language permits a large class of models
where discrete and continuous variables may be combined without restriction. In
contrast, we propose an exact inference algorithm with a more restrictive language,
but ensure that inference matches the complexity of specialized inference algorithms
for important classes of statistical models (e.g., Kalman filters).

3 Background: an overview of PRISM 

PRISM programs have Prolog-like syntax (see Fig. 1). In a PRISM program the
msw relation ("multi-valued switch") has a special meaning: msw(X,I,V) says
that V is the outcome of the I-th instance from a family X of random
processes.¹ The set of variables $\{V_i \mid msw(p, i, V_i)\}$ are i.i.d. for
a given random process p. The distribution parameters of the random variables
are specified separately.

¹ Following PRISM, we often omit the instance number in an msw when a program
uses only one instance from a family of random processes.

The program in Fig. 1 encodes a Hidden Markov Model (HMM) in PRISM. The set of
observations is encoded as facts of predicate obs, where obs(I,V) means that
value V was observed at time I. In the figure, the clause defining hmm says
that T is the N-th state if we traverse the HMM starting at an initial state S
(itself the outcome of the random process init). In hmm_part(I, N, S, T), S is
the I-th state, T is the N-th state. The first clause of hmm_part defines the
conditions under which we go from the I-th state S to the (I+1)-th state
NextS. Random processes trans(S) and emit(S) give the distributions of
transitions and emissions, respectively, from state S.

    hmm(N, T) :-
        msw(init, S),
        hmm_part(0, N, S, T).

    hmm_part(I, N, S, T) :-
        I < N, NextI is I+1,
        msw(trans(S), I, NextS),
        obs(NextI, A),
        msw(emit(NextS), NextI, A),
        hmm_part(NextI, N, NextS, T).

    hmm_part(I, N, S, T) :- I=N, S=T.

Fig. 1: PRISM program for an HMM

The meaning of a PRISM program is given in terms of a distribution semantics
(Sato and Kameya 1997; Sato and Kameya 1999). A PRISM program is treated as a
non-probabilistic logic program over a set of probabilistic facts, the msw
relation. An instance of the msw relation defines one choice of values of all
random variables. A PRISM program is associated with a set of least models,
one for each msw relation instance. A probability distribution is then defined
over the set of models, based on the probability distribution of the msw
relation instances. This distribution is the semantics of a PRISM program.
Note that the distribution semantics is declarative. For a subclass of
programs, PRISM has an efficient procedure for computing this semantics based
on OLDT resolution (Tamaki and Sato 1986).

Inference in PRISM proceeds as follows. When the goal selected at a step is of
the form msw(X, I, Y), then Y is bound to a possible outcome of a random process X.
Thus in PRISM, derivations are constructed by enumerating the possible outcomes 
of each random variable. The derivation step is associated with the probability of 
this outcome. If all random processes encountered in a derivation are independent, 
then the probability of the derivation is the product of probabilities of each step 
in the derivation. If a set of derivations are pairwise mutually exclusive, the prob- 
ability of the set is the sum of probabilities of each derivation in the set. PRISM's 
evaluation procedure is defined only when the independence and exclusiveness as- 
sumptions hold. Finally, the probability of an answer is the probability of the set 
of derivations of that answer. 
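
The enumeration described above can be sketched in Python for the HMM of Fig. 1. This is only an illustration under stated assumptions: the two-state model and all numeric parameters in `INIT`, `TRANS` and `EMIT` are hypothetical (the paper gives none), and the brute-force loop is not PRISM's tabled evaluation. Each state path plays the role of one derivation; its probability is the product of the msw outcomes it uses (independence), and the answer probability is the sum over paths (mutual exclusiveness).

```python
from itertools import product

# Hypothetical two-state HMM parameters (assumed for illustration only).
INIT = {"s0": 0.6, "s1": 0.4}
TRANS = {"s0": {"s0": 0.7, "s1": 0.3}, "s1": {"s0": 0.2, "s1": 0.8}}
EMIT = {"s0": {"x": 0.9, "y": 0.1}, "s1": {"x": 0.5, "y": 0.5}}


def answer_probability(observations, final_state):
    """Probability of the answer hmm(N, final_state), by enumerating derivations."""
    n = len(observations)
    total = 0.0
    # Each state path corresponds to one derivation (one choice per msw).
    for path in product(INIT, repeat=n + 1):
        if path[-1] != final_state:
            continue
        p = INIT[path[0]]  # msw(init, S)
        for i, obs in enumerate(observations):
            # msw(trans(S), I, NextS) followed by msw(emit(NextS), NextI, A)
            p *= TRANS[path[i]][path[i + 1]] * EMIT[path[i + 1]][obs]
        total += p  # derivations are mutually exclusive, so probabilities add
    return total


if __name__ == "__main__":
    print(answer_probability(["x", "y"], "s1"))
```

The number of paths grows exponentially with the number of observations, which is why enumeration cannot be carried over directly to random variables with infinitely many outcomes.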



4 Extended PRISM 

Support for continuous variables is added by modifying PRISM's language in two
ways. We use the msw relation to sample from discrete as well as continuous
distributions. In PRISM, a special relation called values is used to specify
the ranges of values of random variables; the probability mass functions are
specified using set_sw directives. In our extension, we extend the set_sw
directives to specify probability density functions as well. For instance,
set_sw(r, norm(Mu, Var)) specifies that outcomes of random processes r have
Gaussian distribution with mean Mu and variance Var.² Parameterized families
of random processes may be specified, as long as the parameters are
discrete-valued. For instance, set_sw(w(M), norm(Mu, Var)) specifies a family
of random processes, with one for each value of M. As in PRISM, set_sw
directives may be specified programmatically; for instance, in the
specification of w(M), the distribution parameters may be computed as
functions of M.

Additionally, we extend PRISM programs with linear equality constraints over
reals. Without loss of generality, we assume that constraints are written as
linear equalities of the form $Y = a_1 * X_1 + \ldots + a_n * X_n + b$ where
$a_i$ and $b$ are all floating-point constants. The use of constraints enables
us to encode Hybrid Bayesian Networks and Kalman Filters as extended PRISM
programs. In the following, we use Constr to denote a set (conjunction) of
linear equality constraints. We also denote by $\bar{X}$ a vector of variables
and/or values, explicitly specifying the size only when it is not clear from
the context. This permits us to write linear equality constraints compactly
(e.g., $Y = \bar{a} \cdot \bar{X} + b$).

Encoding of Kalman Filter specifications uses linear constraints and closely
follows the structure of the HMM specification, and is shown in Section 5.

Distribution Semantics: We extend PRISM's distribution semantics for continuous
random variables as follows. The idea is to construct a probability space for
the msw definitions (called probabilistic facts in PRISM) and then extend it
to a probability space for the entire program using least model semantics.
Sample space for the probabilistic facts is constructed from those of discrete
and continuous random variables. The sample space of a continuous random
variable is the set of real numbers, $\mathbb{R}$. The sample space of a set
of random variables is a Cartesian product of the sample spaces of individual
variables. We complete the definition of a probability space for continuous
random variables by considering the Borel $\sigma$-algebra over
$\mathbb{R}^n$, and defining a Lebesgue measure on this set as the probability
measure. Lifting the probability space to cover the entire program needs one
significant step. We use the least model semantics of constraint logic
programs (Jaffar et al. 1998) as the basis for defining the semantics of
extended PRISM programs. A point in the sample space is an arbitrary
interpretation of the program, with its Herbrand universe and $\mathbb{R}$ as
the domain of interpretation. For each sample, we distinguish between the
interpretation of user-defined predicates and probabilistic facts. Note that
only the probabilistic facts have probabilistic behavior in PRISM; the rest of
a model is defined in terms of logical consequence. Hence, we can define a
probability measure over a set of sample points by using the measure defined
for the probabilistic facts alone. The semantics of an extended PRISM program
is thus defined as a distribution over its possible models.

5 Inference 

Recall that PRISM's inference explicitly enumerates outcomes of random
variables in derivations. The key to inference in the presence of continuous
random variables is avoiding enumeration by representing the derivations and
their attributes symbolically. A single step in the construction of a symbolic
derivation is defined below.

² The technical development in this paper considers only univariate Gaussian
variables; see the Discussion section for a discussion on how multivariate
Gaussian as well as other continuous distributions are handled.

Definition 1 (Symbolic Derivation)

A goal $G$ directly derives goal $G'$, denoted $G \rightarrow G'$, if:

PCR: $G = q_1(\bar{X_1}), G_1$, and there exists a clause in the program,
$q_1(\bar{Y}) :- r_1(\bar{Y_1}), r_2(\bar{Y_2}), \ldots, r_m(\bar{Y_m})$, such
that $\theta = mgu(q_1(\bar{X_1}), q_1(\bar{Y}))$; then,
$G' = (r_1(\bar{Y_1}), r_2(\bar{Y_2}), \ldots, r_m(\bar{Y_m}), G_1)\theta$;

MSW: $G = msw(rv(\bar{X}), Y), G_1$: then $G' = G_1$;

CONS: $G = Constr, G_1$ and $Constr$ is satisfiable: then $G' = G_1$.

A symbolic derivation of $G$ is a sequence of goals $G_0, G_1, \ldots$ such
that $G = G_0$ and, for all $i \geq 0$, $G_i \rightarrow G_{i+1}$.

We only consider successful derivations, i.e., the last step of a derivation resolves to 
an empty clause. Note that the traditional notion of derivation in a logic program 
coincides with that of symbolic derivation when the selected subgoal (literal) is not 
an msw or a constraint. When the selected subgoal is an msw, PRISM's inference 
will construct the next step by enumerating the values of the random variable. 
In contrast, symbolic derivation skips msw's and constraints and continues with 
the remaining subgoals in a goal. The effect of these constructs is computed by 
associating (a) variable type information and (b) a success function (defined below) 
with each goal in the derivation. The symbolic derivation for the goal widget(X)
over the program in Example 1 is shown in Fig. 2b.
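
The three derivation rules can be illustrated with a deliberately simplified Python sketch. It rests on assumptions that are not part of the definition: literals are opaque strings tagged with a kind, each predicate has exactly one clause (so no branching), no unification is performed, and constraints are taken to be satisfiable. The names `PROGRAM` and `symbolic_derivation` are hypothetical. Under those assumptions it reproduces the goal sequence for widget(X).

```python
# Each literal is (kind, text); kind is "pred", "msw" or "constr".
# Assumption: one clause per predicate, so PCR never branches here.
PROGRAM = {
    "widget(X)": (
        ("msw", "msw(m, M)"),
        ("msw", "msw(st(M), Z)"),
        ("msw", "msw(pt, Y)"),
        ("constr", "X = Y + Z"),
    ),
}


def symbolic_derivation(goal):
    """Return the list of goals G1, G2, ... ending with the empty goal."""
    trace = [goal]
    while goal:
        (kind, text), rest = goal[0], goal[1:]
        if kind == "pred":
            # PCR: replace the selected literal by the clause body.
            goal = PROGRAM[text] + rest
        else:
            # MSW and CONS: skip the literal; outcomes are not enumerated.
            goal = rest
        trace.append(goal)
    return trace


if __name__ == "__main__":
    for i, g in enumerate(symbolic_derivation((("pred", "widget(X)"),)), 1):
        print(f"G{i}:", ", ".join(text for _, text in g) or "(empty)")
```

Note that the msw steps shorten the goal without creating one branch per outcome, which is the difference from PRISM's enumerative derivation.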



(a) Mixture model program

    widget(X) :- msw(m, M),
                 msw(st(M), Z),
                 msw(pt, Y),
                 X = Y + Z.

    % Ranges of RVs
    values(m, [a,b]).
    values(st(M), real).
    values(pt, real).

    % PDFs and PMFs:
    :- set_sw(m, [0.3, 0.7]),
       set_sw(st(a), norm(2.0, 1.0)),
       set_sw(st(b), norm(3.0, 1.0)),
       set_sw(pt, norm(0.5, 0.1)).

(b) Symbolic derivation for goal widget(X)

    G1: widget(X)
         |
    G2: msw(m, M), msw(st(M), Z), msw(pt, Y), X = Y + Z.
         |
    G3: msw(st(M), Z), msw(pt, Y), X = Y + Z.
         |
    G4: msw(pt, Y), X = Y + Z.
         |
    G5: X = Y + Z.

Fig. 2: Finite Mixture Model Program and Symbolic Derivation



Example 1 

Consider a factory with two machines a and b. Each machine produces a widget
structure and then the structure is painted with a color. In the program in
Fig. 2a, msw(m, M) chooses either machine a or b, msw(st(M), Z) gives the cost
Z of a product structure, msw(pt, Y) gives the cost Y of painting, and finally
X = Y + Z returns the price of a painted widget X. □



(a) Example program

    q(Y) :- msw(rv, X),
            p(X, Y).

    p(a, Y) :- r(Y).
    p(b, Y) :- s(Y).

    r(1).
    r(2).
    s(2).
    s(3).

    values(rv, [a,b]).
    :- set_sw(rv, [0.3, 0.7]).

(b) Symbolic derivation for goal q(Y)

                    q(Y)
                     |
              msw(rv,X), p(X,Y)
                     |
                  p(X,Y)
             X=a /      \ X=b
              r(Y)      s(Y)
          Y=1 /  \ Y=2  Y=2 /  \ Y=3
            o     o       o     o

Fig. 3: Symbolic derivation

Example 2 

This example illustrates how symbolic derivation differs from traditional logic
programming derivation. Fig. 3b shows the symbolic derivation for goal q(Y) in
Fig. 3a. Notice that the symbolic derivation still makes branches in the
derivation tree for various logic definitions and outcomes. But the main
difference with traditional logic derivation is that it skips msw and Constr
definitions, and continues with the remaining subgoals in a goal. □

Success Functions: Goals in a symbolic derivation may contain variables whose
values are determined by msw's appearing subsequently in the derivation. With
each goal $G_i$ in a symbolic derivation, we associate a set of variables,
$V(G_i)$, that is a subset of variables in $G_i$. The set $V(G_i)$ is such
that the variables in $V(G_i)$ subsequently appear as parameters or outcomes
of msw's in some subsequent goal $G_j$, $j \geq i$. We can further partition
$V$ into two disjoint sets, $V_c$ and $V_d$, representing continuous and
discrete variables, respectively. The sets $V_c$ and $V_d$ are called the
derivation variables of $G_i$, defined below.

Definition 2 (Derivation Variables)

Let $G \rightarrow G'$ such that $G'$ is derived from $G$ using:

PCR: Let $\theta$ be the mgu in this step. Then $V_c(G)$ and $V_d(G)$ are the
largest sets of variables in $G$ such that $V_c(G)\theta \subseteq V_c(G')$
and $V_d(G)\theta \subseteq V_d(G')$.

MSW: Let $G = msw(rv(\bar{X}), Y), G'$. Then $V_c(G)$ and $V_d(G)$ are the
largest sets of variables in $G$ such that
$V_c(G) \subseteq V_c(G') \cup \{Y\}$, and
$V_d(G) \subseteq V_d(G') \cup \bar{X}$ if $Y$ is continuous, otherwise
$V_c(G) \subseteq V_c(G')$, and
$V_d(G) \subseteq V_d(G') \cup \bar{X} \cup \{Y\}$.

CONS: Let $G = Constr, G'$. Then $V_c(G)$ and $V_d(G)$ are the largest sets of
variables in $G$ such that $V_c(G) \subseteq V_c(G') \cup vars(Constr)$, and
$V_d(G) \subseteq V_d(G')$.

Given a goal $G_i$ in a symbolic derivation, we can associate with it a
success function, which is a function from the set of all valuations of
$V(G_i)$ to $[0, 1]$. Intuitively, the success function represents the
probability that the symbolic derivation represents a successful derivation
for each valuation of $V(G_i)$.

Representation of success functions: Given a set of variables $V$, let
$\mathcal{C}$ denote the set of all linear equality constraints over reals
using $V$. Let $\mathcal{L}$ be the set of all linear functions over $V$ with
real coefficients. Let $\mathcal{N}_X(\mu, \sigma^2)$ be the PDF of a
univariate Gaussian distribution with mean $\mu$ and variance $\sigma^2$, and
$\delta_x(X)$ be the Dirac delta function which is zero everywhere except at
$x$ and whose integral over its entire range is 1. Expressions of the form
$k * \prod_l \delta_v(V_l) \prod_i \mathcal{N}_{f_i}$, where $k$ is a
non-negative real number and $f_i \in \mathcal{L}$, are called product PDF
(PPDF) functions over $V$. We use $\phi$ (possibly subscripted) to denote such
functions. A pair $\langle \phi, C \rangle$ where $C \subseteq \mathcal{C}$ is
called a constrained PPDF function. A sum of a finite number of constrained
PPDF functions is called a success function, represented as
$\sum_i \langle \phi_i, C_i \rangle$.

We use $C_i(\psi)$ to denote the constraints (i.e., $C_i$) in the $i^{th}$
constrained PPDF function of success function $\psi$; and $D_i(\psi)$ to
denote the $i^{th}$ PPDF function of $\psi$.

Success functions of base predicates: The success function of a constraint $C$
is $\langle 1, C \rangle$. The success function of true is
$\langle 1, true \rangle$. The PPDF component of $msw(rv(\bar{X}), Y)$'s
success function is the probability density function of rv's distribution if
rv is continuous, and its probability mass function if rv is discrete; its
constraint component is true.

Example 3 

The success function $\psi_1$ of msw(m,M) for the program in Example 1 is such
that $\psi_1 = 0.3\delta_a(M) + 0.7\delta_b(M)$. Note that we can represent
the success function using tables, where each table row denotes discrete
random variable valuations. For example, the above success function can be
represented as Fig. 4a. Thus, we often omit delta functions in examples and
represent success functions using tables.

Fig. 4b represents the success function of msw(st(M), Z) for the program in
Example 1. Similarly, the success function $\psi_3$ of msw(pt, Y) for the
program in Example 1 is $\psi_3 = \mathcal{N}_Y(0.5, 0.1)$.

Finally, the success function $\psi_4$ of X = Y + Z for the program in
Example 1 is $\psi_4 = \langle 1, X = Y + Z \rangle$. □

    (a) msw(m, M)          (b) msw(st(M), Z)

    M | $\psi_1$           M | $\psi_2$
    --+---------           --+-----------------------------
    a | 0.3                a | $\mathcal{N}_Z(2.0, 1.0)$
    b | 0.7                b | $\mathcal{N}_Z(3.0, 1.0)$

Fig. 4: Success Functions

Success functions of user-defined predicates: If $G \rightarrow G'$ is a step
in a derivation, then the success function of $G$ is computed bottom-up based
on the success function of $G'$. This computation is done using join and
marginalize operations on success functions.

Definition 3 (Join)

Let $\psi_1 = \sum_i \langle D_i, C_i \rangle$ and
$\psi_2 = \sum_j \langle D_j, C_j \rangle$ be two success functions, then join
of $\psi_1$ and $\psi_2$ represented as $\psi_1 * \psi_2$ is the success
function $\sum_{i,j} \langle D_i D_j, C_i \wedge C_j \rangle$.
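Example 4

Let Fig. 5a and 5b represent the success functions $\psi_{msw(m,M)}(M)$ and
$\psi_{G_3}(X, Y, Z, M)$ respectively.

    (a)
    M | $\psi_{msw(m,M)}(M)$
    --+---------------------
    a | 0.3
    b | 0.7

    (b)
    M | $\psi_{G_3}(X, Y, Z, M)$
    --+--------------------------------------------------------------------------
    a | $\langle \mathcal{N}_Z(2.0, 1.0).\mathcal{N}_Y(0.5, 0.1), X = Y + Z \rangle$
    b | $\langle \mathcal{N}_Z(3.0, 1.0).\mathcal{N}_Y(0.5, 0.1), X = Y + Z \rangle$

    (c)
    M | $\psi_{G_2}(X, Y, Z, M) = \psi_{msw(m,M)}(M) * \psi_{G_3}(X, Y, Z, M)$
    --+------------------------------------------------------------------------------
    a | $\langle 0.3\mathcal{N}_Z(2.0, 1.0).\mathcal{N}_Y(0.5, 0.1), X = Y + Z \rangle$
    b | $\langle 0.7\mathcal{N}_Z(3.0, 1.0).\mathcal{N}_Y(0.5, 0.1), X = Y + Z \rangle$

Fig. 5: Join of Success Functions

Then Fig. 5c shows the join of $\psi_{msw(m,M)}(M)$ and
$\psi_{G_3}(X, Y, Z, M)$. □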

Note that we perform a simplification of success functions after the join
operation. We eliminate any PPDF term in $\psi$ which is inconsistent w.r.t.
delta functions. For example, $\delta_a(M)\delta_b(M) = 0$ as $M$ cannot be
both $a$ and $b$ at the same time.
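
The join with this simplification can be sketched in Python. The data layout is an assumption made for illustration, not the paper's implementation: a success function is a list of terms, and each term is a dictionary holding the constant `k`, the delta functions as a variable-to-value map, the Gaussian factors as `(variable, mean, variance)` tuples, and the constraints as a set of strings. The function name `join` and the variable names are likewise hypothetical.

```python
from itertools import product


def join(psi1, psi2):
    """Pairwise product of terms, dropping terms with contradictory deltas."""
    result = []
    for t1, t2 in product(psi1, psi2):
        deltas = dict(t1["deltas"])
        consistent = True
        for var, val in t2["deltas"].items():
            # setdefault returns the existing value when var is already bound.
            if deltas.setdefault(var, val) != val:
                consistent = False  # e.g. delta_a(M) * delta_b(M) = 0
                break
        if not consistent:
            continue
        result.append({
            "k": t1["k"] * t2["k"],
            "deltas": deltas,
            "pdfs": t1["pdfs"] + t2["pdfs"],
            "constraints": t1["constraints"] | t2["constraints"],
        })
    return result


# Success functions of msw(m, M) and of goal G3, as in Example 4.
psi_m = [
    {"k": 0.3, "deltas": {"M": "a"}, "pdfs": (), "constraints": frozenset()},
    {"k": 0.7, "deltas": {"M": "b"}, "pdfs": (), "constraints": frozenset()},
]
psi_g3 = [
    {"k": 1.0, "deltas": {"M": "a"},
     "pdfs": (("Z", 2.0, 1.0), ("Y", 0.5, 0.1)),
     "constraints": frozenset({"X = Y + Z"})},
    {"k": 1.0, "deltas": {"M": "b"},
     "pdfs": (("Z", 3.0, 1.0), ("Y", 0.5, 0.1)),
     "constraints": frozenset({"X = Y + Z"})},
]

if __name__ == "__main__":
    for term in join(psi_m, psi_g3):
        print(term)
```

Of the four pairwise products, the two that pair M = a with M = b are eliminated, leaving the two rows of Fig. 5c.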

Given a success function $\psi$ for a goal $G$, the success function for
$\exists X.\ G$ is computed by the marginalization operation. Marginalization
w.r.t. a discrete variable is straightforward and omitted. Below we define
marginalization w.r.t. continuous variables in two steps: first rewriting the
success function in a projected form and then doing the required integration.

The goal of projection is to eliminate any linear constraint on $V$, where $V$
is the continuous variable to marginalize over. The projection operation
involves finding a linear constraint (i.e., $V = \bar{a} \cdot \bar{X} + b$)
on $V$ and replacing all occurrences of $V$ in the success function by
$\bar{a} \cdot \bar{X} + b$.

Definition 4 (Projection)

Projection of a success function $\psi$ w.r.t. a continuous variable $V$,
denoted by $\psi\downarrow_V$, is a success function $\psi'$ such that

$\forall i.\ D_i(\psi') = D_i(\psi)[\bar{a} \cdot \bar{X} + b / V]$; and
$C_i(\psi') = (C_i(\psi) - C_{ip})[\bar{a} \cdot \bar{X} + b / V]$,

where $C_{ip}$ is a linear constraint ($V = \bar{a} \cdot \bar{X} + b$) on $V$
in $C_i(\psi)$ and $t[s/x]$ denotes replacement of all occurrences of $x$ in
$t$ by $s$.

Note that the replacement of $V$ by $\bar{a} \cdot \bar{X} + b$ in PDFs and
linear constraints does not alter the general form of a success function. Thus
projection returns a success function. Notice that if $\psi$ does not contain
any linear constraint on $V$, then the projected form remains the same.

Example 5

Let $\psi_1 = \langle 0.3\mathcal{N}_Z(2.0, 1.0).\mathcal{N}_Y(0.5, 0.1), X = Y + Z \rangle$
represent a success function. Then projection of $\psi_1$ w.r.t. $Y$ yields

$\psi_1\downarrow_Y = 0.3\mathcal{N}_Z(2.0, 1.0).\mathcal{N}_{X-Z}(0.5, 0.1).$    (1)

Notice that $Y$ is replaced by $X - Z$. □
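Proposition 1

Integration of a PPDF function with respect to a variable $V$ is a PPDF
function, i.e.,

$\alpha \int_{-\infty}^{\infty} \prod_{k=1}^{m} \mathcal{N}_{(\bar{a_k} \cdot \bar{X_k} + b_k)}(\mu_k, \sigma_k^2)\, dV = \alpha' \prod_{l=1}^{m'} \mathcal{N}_{(\bar{a'_l} \cdot \bar{X'_l} + b'_l)}(\mu'_l, \sigma'^2_l)$

where $V \in \bar{X_k}$ and $V \notin \bar{X'_l}$.

For example,

$\int_{-\infty}^{\infty} \mathcal{N}_{a_1 V - X_1}(\mu_1, \sigma_1^2).\mathcal{N}_{a_2 V - X_2}(\mu_2, \sigma_2^2)\, dV = \mathcal{N}_{a_2 X_1 - a_1 X_2}(a_1\mu_2 - a_2\mu_1, a_2^2\sigma_1^2 + a_1^2\sigma_2^2).$    (2)

Here $X_1, X_2$ are linear combinations of variables (except $V$). A proof of
the proposition is presented in Section 8.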
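
Equation 2 can be checked numerically. The sketch below is only a sanity check under stated assumptions, not part of the paper's inference procedure: it approximates the left-hand side with a trapezoidal rule over a wide finite interval and compares it with the closed form on the right-hand side. The function names (`normal_pdf`, `lhs`, `rhs`) and all numeric inputs are hypothetical; $X_1$ and $X_2$ are fixed to concrete numbers for the check.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def lhs(a1, x1, mu1, var1, a2, x2, mu2, var2, lo=-40.0, hi=40.0, steps=80_000):
    """Trapezoidal approximation of the integral over V in Equation 2."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        v = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal_pdf(a1 * v - x1, mu1, var1) \
                        * normal_pdf(a2 * v - x2, mu2, var2)
    return total * h


def rhs(a1, x1, mu1, var1, a2, x2, mu2, var2):
    """Closed form on the right-hand side of Equation 2."""
    return normal_pdf(a2 * x1 - a1 * x2,
                      a1 * mu2 - a2 * mu1,
                      a2 ** 2 * var1 + a1 ** 2 * var2)


if __name__ == "__main__":
    args = (2.0, 1.0, 0.5, 1.0, 3.0, -2.0, 1.0, 0.5)
    print(lhs(*args), rhs(*args))
```

Writing $\mathcal{N}_{X-Z}(0.5, 0.1)$ as $\mathcal{N}_{Z-X}(-0.5, 0.1)$ and taking $a_1 = a_2 = 1$, $X_1 = 0$, $X_2 = X$ gives $\mathcal{N}_{-X}(-2.5, 1.1) = \mathcal{N}_X(2.5, 1.1)$, the instance used in Example 6.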

Definition 5 (Integration)

Let $\psi$ be a success function that does not contain any linear constraints
on $V$. Then integration of $\psi$ with respect to $V$, denoted by
$\oint_V \psi$, is a success function $\psi'$ such that
$\forall i.\ D_i(\psi') = \int D_i(\psi)\, dV$.

It is easy to see (using Proposition 1) that integrals of success functions
are also success functions. Note that if $\psi$ does not contain any PDF on
$V$, then the integrated form remains the same.

Example 6

Let $\psi_2 = 0.3\mathcal{N}_Z(2.0, 1.0).\mathcal{N}_{X-Z}(0.5, 0.1)$
represent a success function. Then integration of $\psi_2$ w.r.t. $Z$ yields

$\oint_Z \psi_2 = \int 0.3\mathcal{N}_Z(2.0, 1.0).\mathcal{N}_{X-Z}(0.5, 0.1)\, dZ$
$= 0.3\mathcal{N}_X(2.5, 1.1).$ (using Equation 2)

Definition 6 (Marginalize)

Marginalization of a success function $\psi$ with respect to a variable $V$,
denoted by $\mathbb{M}(\psi, V)$, is a success function $\psi'$ such that

$\psi' = \oint_V \psi\downarrow_V$

We overload $\mathbb{M}$ to denote marginalization over a set of variables,
defined such that
$\mathbb{M}(\psi, \{V\} \cup \bar{X}) = \mathbb{M}(\mathbb{M}(\psi, V), \bar{X})$
and $\mathbb{M}(\psi, \{\}) = \psi$.

Proposition 2 

The set of all success functions is closed under join and marginalize operations. 
The success function for a derivation is defined as follows. 
Definition 7 (Success function of a goal)

The success function of a goal $G$, denoted by $\psi_G$, is computed based on
the derivation $G \rightarrow G'$:

$\psi_G = \begin{cases}
\sum_{G'} \mathbb{M}(\psi_{G'}, V(G') - V(G)) & \text{for all program clause resolution } G \rightarrow G' \\
\psi_{msw(rv(\bar{X}),Y)} * \psi_{G'} & \text{if } G = msw(rv(\bar{X}), Y), G' \\
\psi_{Constr} * \psi_{G'} & \text{if } G = Constr, G'
\end{cases}$

Note that the above definition carries PRISM's assumption that an instance of a
random variable occurs at most once in any derivation. In particular, the PCR step
marginalizes success functions w.r.t. a set of variables; the valuations of the set of
variables must be mutually exclusive for correctness of this step. The MSW step
joins success functions; the goals joined must use independent random variables for
the join operation to correctly compute success functions in this step.

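Example 7

Fig. 2b shows the symbolic derivation for the goal widget(X) over the mixture
model program in Example 1. The success function of goal $G_5$ is
$\psi_{G_5}(X, Y, Z) = \langle 1, X = Y + Z \rangle$.

$$\psi_{G_4}(X, Y, Z) = \psi_{msw(pt,Y)}(Y) * \psi_{G_5}(X, Y, Z) = \langle \mathcal{N}_Y(0.5, 0.1),\; X = Y + Z \rangle.$$

The success function of goal $G_3$ is $\psi_{msw(st(M),Z)}(Z) * \psi_{G_4}(X, Y, Z)$ (Fig. 5b).
Then join of $\psi_{msw(m,M)}(M)$ and $\psi_{G_3}(X, Y, Z, M)$ yields the success function in
Fig. 5c (see Example 4).

Finally, $\psi_{G_1}(X) = M(\psi_{G_2}(X, Y, Z, M), \{M, Y, Z\})$.
First we marginalize $\psi_{G_2}(X, Y, Z, M)$ w.r.t. $M$:

$$\begin{aligned}
\psi'_{G_2} = M(\psi_{G_2}, M) = \oint_M \psi_{G_2} \downarrow_M
&= \langle 0.3\,\mathcal{N}_Z(2.0, 1.0) \cdot \mathcal{N}_Y(0.5, 0.1),\; X = Y + Z \rangle \\
&\quad + \langle 0.7\,\mathcal{N}_Z(3.0, 1.0) \cdot \mathcal{N}_Y(0.5, 0.1),\; X = Y + Z \rangle.
\end{aligned}$$

Next we marginalize the above success function w.r.t. $Y$:

$$\psi''_{G_2} = M(\psi'_{G_2}, Y) = \oint_Y \psi'_{G_2} \downarrow_Y = 0.3\,\mathcal{N}_Z(2.0, 1.0) \cdot \mathcal{N}_{X-Z}(0.5, 0.1) + 0.7\,\mathcal{N}_Z(3.0, 1.0) \cdot \mathcal{N}_{X-Z}(0.5, 0.1).$$

Finally, we marginalize the above function over variable $Z$ to get $\psi_{G_1}(X)$:

$$\psi_{G_1}(X) = M(\psi''_{G_2}, Z) = \oint_Z \psi''_{G_2} \downarrow_Z = 0.3\,\mathcal{N}_X(2.5, 1.1) + 0.7\,\mathcal{N}_X(3.5, 1.1).$$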
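
The final mixture in Example 7 can be cross-checked by carrying out the marginalization over $Z$ numerically and comparing it with the closed form. The sketch below is an illustration under stated assumptions: the switch parameters are the ones appearing in the example, while the names `WEIGHTS`, `Z_PARAMS`, `psi_closed_form` and `psi_by_marginalization`, and the use of a trapezoidal rule, are choices made here and not part of the symbolic procedure.

```python
import math

def normal_pdf(x, mean, var):
    """Gaussian density with the given mean and variance, evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(f, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

WEIGHTS = {"a": 0.3, "b": 0.7}                  # distribution of M
Z_PARAMS = {"a": (2.0, 1.0), "b": (3.0, 1.0)}   # (mean, variance) of Z for each value of M
Y_MEAN, Y_VAR = 0.5, 0.1                        # (mean, variance) of Y

def psi_closed_form(x):
    """0.3 N_X(2.5, 1.1) + 0.7 N_X(3.5, 1.1), the result stated in Example 7."""
    return sum(w * normal_pdf(x, Z_PARAMS[m][0] + Y_MEAN, Z_PARAMS[m][1] + Y_VAR)
               for m, w in WEIGHTS.items())

def psi_by_marginalization(x):
    """Sum over M and integrate over Z after substituting Y = X - Z."""
    total = 0.0
    for m, w in WEIGHTS.items():
        z_mean, z_var = Z_PARAMS[m]
        def integrand(z, z_mean=z_mean, z_var=z_var):
            return normal_pdf(z, z_mean, z_var) * normal_pdf(x - z, Y_MEAN, Y_VAR)
        total += w * integrate(integrand, -20.0, 25.0)
    return total

print(psi_closed_form(3.0), psi_by_marginalization(3.0))
```

The numeric route enumerates nothing symbolic; it only confirms that the symbolic result is the density one expects for $X = Y + Z$.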

Example 8

In this example, we compute the success function of goal q(Y) in Example 2. Fig. 3b
shows the symbolic derivation for goal q(Y). The success function of r(Y) is
$\delta_1(Y) + \delta_2(Y)$, and the success function of s(Y) is $\delta_2(Y) + \delta_3(Y)$. Similarly, the
success function of p(X,Y) is $\delta_a(X)(\delta_1(Y) + \delta_2(Y)) + \delta_b(X)(\delta_2(Y) + \delta_3(Y))$.

The success function of msw(rv, X) is $0.3\,\delta_a(X) + 0.7\,\delta_b(X)$. Join of $\psi_{msw(rv,X)}$ and
$\psi_{p(X,Y)}$ yields $0.3\,\delta_a(X)(\delta_1(Y) + \delta_2(Y)) + 0.7\,\delta_b(X)(\delta_2(Y) + \delta_3(Y))$. Finally,
$\psi_{q(Y)} = 0.3(\delta_1(Y) + \delta_2(Y)) + 0.7(\delta_2(Y) + \delta_3(Y))$.

When Y = 1, only p(a, 1) is true. Thus $\psi_{q(1)} = 0.3$. On the other hand, $\psi_{q(2)} = 1.0$
as both p(a, 2) and p(b, 2) are true when Y = 2. Similarly, $\psi_{q(3)} = 0.7$. □
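
For purely discrete programs the join and marginalize operations reduce to table manipulation, which the following sketch illustrates on Example 8. The tabular encoding (dictionaries keyed by value tuples) and the function names `join` and `marginalize` are assumptions made for this illustration; they are meant to mirror the operations of the text, not to reproduce the actual implementation. As in the text, summing over X is only correct because the values of X are mutually exclusive.

```python
def join(f, f_vars, g, g_vars):
    """Multiply two tabular success functions, matching them on shared variables."""
    out_vars = f_vars + tuple(v for v in g_vars if v not in f_vars)
    out = {}
    for f_key, f_weight in f.items():
        f_asg = dict(zip(f_vars, f_key))
        for g_key, g_weight in g.items():
            g_asg = dict(zip(g_vars, g_key))
            # Keep only pairs of rows that agree on every shared variable.
            if all(f_asg[v] == g_asg[v] for v in f_asg if v in g_asg):
                merged = {**f_asg, **g_asg}
                out[tuple(merged[v] for v in out_vars)] = f_weight * g_weight
    return out, out_vars

def marginalize(f, f_vars, var):
    """Sum out one discrete variable from a tabular success function."""
    idx = f_vars.index(var)
    out_vars = f_vars[:idx] + f_vars[idx + 1:]
    out = {}
    for key, weight in f.items():
        reduced = key[:idx] + key[idx + 1:]
        out[reduced] = out.get(reduced, 0.0) + weight
    return out, out_vars

# Success functions from Example 8.
psi_msw = {("a",): 0.3, ("b",): 0.7}                                   # over (X,)
psi_p = {("a", 1): 1.0, ("a", 2): 1.0, ("b", 2): 1.0, ("b", 3): 1.0}   # over (X, Y)

joined, joined_vars = join(psi_msw, ("X",), psi_p, ("X", "Y"))
psi_q, q_vars = marginalize(joined, joined_vars, "X")
print(q_vars, psi_q)
```

The resulting table over Y has one entry per answer of q(Y), which matches the observation made later in the text that, for discrete programs, entries of success functions correspond to PRISM's answer tables.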

Complexity: Let $S_i$ denote the number of constrained PPDF terms in $\psi_i$, $P_i$ denote
the maximum number of product terms in any PPDF function in $\psi_i$, and $Q_i$ denote
the maximum size of a constraint set ($C_i$) in $\psi_i$. The time complexity of the two
basic operations used in constructing a symbolic derivation is as follows.

Proposition 3 (Time Complexity)

The worst-case time complexity of $Join(\psi_i, \psi_j)$ is $O(S_i * S_j * (P_i * P_j + Q_i * Q_j))$.
The worst-case time complexity of $M(\psi_g, V)$ is $O(S_g * P_g)$ when $V$ is discrete and
$O(S_g * (P_g + Q_g))$ when $V$ is continuous.

Note that when computing the success function of a goal in a derivation, the 
join operation is limited to joining the success function of a single msw or a single 
constraint set to the success function of a goal, and hence the parameters Si, Pi, 
and Qi are typically small. The complexity of the size of success functions is as 
follows. 

Proposition 4 (Success Function Size)

For a goal $G$ and its symbolic derivation, the following hold:

1. The maximum number of product terms in any PPDF function in $\psi_G$
is linear in $|V_c(G)|$, the number of continuous variables in $G$.

2. The maximum size of a constraint set in a constrained PPDF function
in $\psi_G$ is linear in $|V_c(G)|$.

3. The maximum number of constrained PPDF functions in any entry of
$\psi_G$ is potentially exponential in the number of discrete random variables
in the symbolic derivation.

The number of product terms and the size of constraint sets are hence indepen- 
dent of the length of the symbolic derivation. Note that for a program with only 
discrete random variables, there may be exponentially fewer symbolic derivations 
than concrete derivations. The compactness is only in terms of number of deriva- 
tions and not the total size of the representations. In fact, for programs with only 
discrete random variables, there is a one-to-one correspondence between the en- 
tries in the tabular representation of success functions and PRISM's answer tables. 
For such programs, it is easy to show that the time complexity of the inference
algorithm presented in this paper is the same as that of PRISM.

Correctness of the Inference Algorithm: The technically complex aspect of
correctness is the closure of the set of success functions w.r.t. join and marginalize
operations. Propositions 1 and 2 state these closure properties. Definition 7 represents
the inference algorithm for computing the success function of a goal. The
distribution of a goal is formally defined in terms of the distribution semantics of extended
PRISM programs and is computed using the inference algorithm.

Theorem 5

The success function of a goal computed by the inference algorithm represents the 
distribution of the answer to that goal. 

Proof: Correctness w.r.t. distribution semantics follows from the definition of join
and marginalize operations, and PRISM's independence and exclusiveness
assumptions. We prove this by induction on derivation length $n$. For $n = 1$, the definition
of success function for base predicates gives a correct distribution.

Now assume that for a derivation of length $n$, our inference algorithm computes
a valid distribution. Let $G'$ have a derivation of length $n$ and
$G \rightarrow G'$. Thus $G$ has a derivation of length $n + 1$. We show that the success
function of $G$ represents a valid distribution.

We compute $\psi_G$ using Definition 7, which carries PRISM's assumption that an
instance of a random variable occurs at most once in any derivation. More specifically,
the PCR step marginalizes $\psi_{G'}$ w.r.t. a set of variables $V(G') - V(G)$. Since,
according to PRISM's exclusiveness assumption, the valuations of the set of variables
are mutually exclusive, the marginalization operation returns a valid distribution.
Analogously, the MSW/CONS step joins success functions, and the goals joined use
independent random variables (following PRISM's assumption) for the join
operation to correctly compute $\psi_G$ in this step. Thus $\psi_G$ represents a valid distribution.

6 Illustrative Example

In this section, we model Kalman filters (Russell and Norvig 2003) using
logic programs. The model describes a random walk of a single continuous state
variable $S_t$ with noisy observation $V_t$. The initial state distribution is assumed
to be Gaussian with mean $\mu_0$ and variance $\sigma_0^2$. The transition and sensor models
are Gaussian noises with zero means and constant variances $\sigma_s^2$, $\sigma_v^2$ respectively.

Fig. 6 shows a logic program for Kalman filter, and Fig. 7 shows the
derivation for a query kf(1, T). Note the similarity between this and the hmm program
(Fig. 1): only the trans/emit definitions are different. We label the $i$-th derivation step
by $G_i$, which is used in the next subsection to refer to the appropriate derivation
step. Here, our goal is to compute the filtered distribution of state T.

kf(N, T) :-
    msw(init, S),
    kf_part(0, N, S, T).

kf_part(I, N, S, T) :-
    I < N, NextI is I+1,
    trans(S, I, NextS),
    emit(NextS, NextI, V),
    obs(NextI, V),
    kf_part(NextI, N, NextS, T).

kf_part(I, N, S, T) :-
    I = N, T = S.

trans(S, I, NextS) :-
    msw(trans_err, I, E),
    NextS = S + E.

emit(NextS, I, V) :-
    msw(obs_err, I, X),
    V = NextS + X.

Fig. 6: Logic program for Kalman Filter

Symbolic derivation:

G1:  kf(1, T)
G2:  msw(init, S), kf_part(0, 1, S, T)
G3:  kf_part(0, 1, S, T)
G4:  0 < 1, NextI is 0 + 1, trans(S, NextS), emit(NextS, V), obs(NextI, V), kf_part(NextI, 1, NextS, T)
G5:  NextI is 0 + 1, trans(S, NextS), emit(NextS, V), obs(NextI, V), kf_part(NextI, 1, NextS, T)
G6:  trans(S, NextS), emit(NextS, V), obs(1, V), kf_part(1, 1, NextS, T)
G7:  msw(trans_err, E), NextS = S + E, emit(NextS, V), obs(1, V), kf_part(1, 1, NextS, T)
G8:  NextS = S + E, emit(NextS, V), obs(1, V), kf_part(1, 1, NextS, T)
G9:  emit(NextS, V), obs(1, V), kf_part(1, 1, NextS, T)
G10: msw(obs_err, X), V = NextS + X, obs(1, V), kf_part(1, 1, NextS, T)
G11: V = NextS + X, obs(1, V), kf_part(1, 1, NextS, T)
G12: obs(1, V), kf_part(1, 1, NextS, T)
G13: kf_part(1, 1, NextS, T)
G14: 1 = 1, T = NextS
G15: T = NextS

Success functions:

$$\begin{aligned}
\psi_{G_2} &= \psi_{msw(init)} * \psi_{G_3} \\
\psi_{G_4} &= \langle \mathcal{N}_{v_1 - NextS}(0, \sigma_v^2) \cdot \mathcal{N}_{NextS - S}(0, \sigma_s^2),\; T = NextS \rangle \\
\psi_{G_5} &= \langle \mathcal{N}_{v_1 - NextS}(0, \sigma_v^2) \cdot \mathcal{N}_{NextS - S}(0, \sigma_s^2),\; T = NextS \rangle \\
\psi_{G_6} &= M(\psi_{G_7}, E) = \langle \mathcal{N}_{v_1 - NextS}(0, \sigma_v^2) \cdot \mathcal{N}_{NextS - S}(0, \sigma_s^2),\; T = NextS \rangle \\
\psi_{G_7} &= \psi_{msw(trans\_err)} * \psi_{G_8} = \langle \mathcal{N}_{v_1 - NextS}(0, \sigma_v^2) \cdot \mathcal{N}_{E}(0, \sigma_s^2),\; T = NextS \wedge NextS = S + E \rangle \\
\psi_{G_8} &= \psi_{NextS = S + E} * \psi_{G_9} = \langle \mathcal{N}_{v_1 - NextS}(0, \sigma_v^2),\; T = NextS \wedge NextS = S + E \rangle \\
\psi_{G_9} &= M(\psi_{G_{10}}, X) = \langle \mathcal{N}_{v_1 - NextS}(0, \sigma_v^2),\; T = NextS \rangle \\
\psi_{G_{10}} &= \psi_{msw(obs\_err)} * \psi_{G_{11}} = \langle \mathcal{N}_{X}(0, \sigma_v^2),\; T = NextS \wedge v_1 = NextS + X \rangle \\
\psi_{G_{11}} &= \psi_{v_1 = NextS + X} * \psi_{G_{12}} = \langle 1,\; T = NextS \wedge v_1 = NextS + X \rangle \\
\psi_{G_{12}} &= \langle 1,\; T = NextS \rangle \\
\psi_{G_{13}} &= \langle 1,\; T = NextS \rangle \\
\psi_{G_{14}} &= \langle 1,\; T = NextS \rangle \\
\psi_{G_{15}} &= \langle 1,\; T = NextS \rangle
\end{aligned}$$

Fig. 7: Symbolic derivation and success functions for kf(1,T)

Success Function Computation: Fig. 7 shows the bottom-up success function
computation. Note that $\psi_{G_{12}}$ is the same as $\psi_{G_{13}}$ except that obs(1, V) binds V to an
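observation $v_1$. The final step involves marginalization w.r.t. $S$:

$$\begin{aligned}
\psi_{G_1} = M(\psi_{G_2}, S) &= \mathcal{N}_{v_1 - T}(0, \sigma_v^2) \cdot \mathcal{N}_T(\mu_0, \sigma_0^2 + \sigma_s^2) && \text{(using Equation 2)} \\
&= \mathcal{N}_T(v_1, \sigma_v^2) \cdot \mathcal{N}_T(\mu_0, \sigma_0^2 + \sigma_s^2) && \text{(constant shifting)}
\end{aligned}$$

Since the product of two Gaussian PDFs is another Gaussian PDF up to normalization,
this is the filtered distribution of state T after seeing one observation, which is
equal to the filtered distribution presented in (Russell and Norvig 2003).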
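
The last step can be checked numerically. The sketch below normalizes the product $\mathcal{N}_T(v_1, \sigma_v^2) \cdot \mathcal{N}_T(\mu_0, \sigma_0^2 + \sigma_s^2)$ by quadrature and compares its mean and variance with the standard closed form for a product of two Gaussians, which is also the usual one-step Kalman update. The function names and the numeric values of $\mu_0$, $\sigma_0^2$, $\sigma_s^2$, $\sigma_v^2$ and $v_1$ are hypothetical choices made for this illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Gaussian density with the given mean and variance, evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(f, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

def filtered_closed_form(mu0, var0, var_s, var_v, v1):
    """Mean and variance of the normalized product of the two Gaussian factors."""
    prior_var = var0 + var_s                  # variance of N_T(mu0, var0 + var_s)
    mean = (prior_var * v1 + var_v * mu0) / (prior_var + var_v)
    var = prior_var * var_v / (prior_var + var_v)
    return mean, var

def filtered_numeric(mu0, var0, var_s, var_v, v1, lo=-40.0, hi=40.0):
    """Same two moments, obtained by normalizing the product numerically."""
    def product(t):
        return normal_pdf(t, v1, var_v) * normal_pdf(t, mu0, var0 + var_s)
    mass = integrate(product, lo, hi)
    mean = integrate(lambda t: t * product(t), lo, hi) / mass
    var = integrate(lambda t: (t - mean) ** 2 * product(t), lo, hi) / mass
    return mean, var

# Hypothetical parameter values for illustration only.
args = dict(mu0=0.0, var0=1.0, var_s=2.0, var_v=1.0, v1=2.5)
print(filtered_closed_form(**args))
print(filtered_numeric(**args))
```

With these values the predicted variance is $1 + 2 = 3$, so the filtered mean lies three quarters of the way from $\mu_0$ towards the observation.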
7 Discussion and Concluding Remarks



ProbLog and PITA (Riguzzi and Swift 2010), an implementation of LPAD, lift
PRISM's mutual exclusion and independence restrictions by using a BDD-based
representation of explanations. The technical development in this paper is based
on PRISM and imposes PRISM's restrictions. However, we can remove these
restrictions by using the following approach. In the first step, we materialize the set
of symbolic derivations. In the second step, we factor the derivations into a
form analogous to BDDs such that random variables on each path of the factored
representation are independent, and distinct paths in the representation are mutually
exclusive. For instance, consider two non-exclusive branches in a symbolic
derivation tree, one of which has msw(r,X) and the other has msw(s,Y). This will
be factored such that one of the two, say msw(r,X'), is done in common, with two
branches: $X = X'$ and $X \neq X'$. The branch containing subgoal msw(s,Y) is
"and-ed" with the $X = X'$ branch, and replicated as the $X \neq X'$ branch, analogous to
how BDDs are processed. The factored representation itself can be treated as
symbolic derivations augmented with dis-equality constraints (i.e., of the form $X \neq e$).
Note that the success function of an equality constraint $C$ is $\langle 1, C \rangle$. The success
function of a dis-equality constraint $X \neq e$ is $\langle 1, true \rangle - \langle 1, X = e \rangle$, which is
representable by extending our language of success functions to permit non-negative
constants. The definitions of join and marginalize operations work with no change
over the extended success functions, and the closure properties (Prop. 2) hold as
well. Hence, success functions can be readily computed over the factored
representation. A detailed discussion of this extension appears in (Islam 2012).

Note that the success function of a goal represents the likelihood of a successful 
derivation for each instance of a goal. Hence the probability measure computed by 
the success function is what PRISM calls inside probability. Analogously, we can 
define a function that represents the likelihood that a goal G' will be encountered 
in a symbolic derivation starting at goal G. This "call" function will represent the 
outside probability of PRISM. Alternatively, we can use the Magic Sets
transformation (Bancilhon et al. 1986) to compute call functions of a program in terms of
success functions of a transformed program. The ability to compute inside and out- 
side probabilities can be used to infer smoothed distributions for temporal models. 

For simplicity, in this paper we focused only on univariate Gaussians. However, 
the techniques can be easily extended to support multivariate Gaussian
distributions, by extending the integration function (Defn. 5), and set_sw directives. We
can also readily extend them to support Gamma distributions. More generally, the 
PDF functions can be generalized to contain Gaussian or Gamma density func- 
tions, such that variables are not shared between Gaussian and Gamma density 
functions. Again, the only change is to extend the integration function to handle 
PDFs of Gamma distribution. 

The concept of symbolic derivations and success functions can be applied to
parameter learning as well. We have developed an EM-based learning algorithm
which permits us to learn the distribution parameters of extended PRISM programs
with discrete as well as Gaussian random variables (Islam et al. 2012). Similar to
inference, our learning algorithm uses the symbolic derivation procedure to compute
Expected Sufficient Statistics (ESS). The E-step of the learning algorithm involves
computation of the ESSs of the random variables and the M-step computes the MLE
of the distribution parameters given the ESS and success probabilities. Analogous
to the inference algorithm presented in this paper, our learning algorithm specializes
to PRISM's learning over programs without any continuous variables. For the mixture
model, the learning algorithm does the same computation as the standard EM learning
algorithm (Bishop 2006).



The symbolic inference and learning procedures enable us to reason over a large 
class of statistical models such as hybrid Bayesian networks with discrete child- 
discrete parent, continuous child-discrete parent (finite mixture model), and con- 
tinuous child-continuous parent (Kalman filter), which was hitherto not possible in 
PLP frameworks. It can also be used for hybrid models, e.g., models that mix dis- 
crete and Gaussian distributions. For instance, consider the mixture model example 
where st(a) is Gaussian but st(b) is a discrete distribution with values 1 and 2 
with 0.5 probability each. The density of the mixture distribution can be written 
as $f(Z) = 0.3\,\mathcal{N}_Z(2.0, 1.0) + 0.35\,\delta_{1.0}(Z) + 0.35\,\delta_{2.0}(Z)$. Thus the language can be
used to model problems that lie outside traditional hybrid Bayesian networks. 
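
One way to see that this hybrid density is well formed is to look at its cumulative distribution function, where the two point masses appear as jumps. The sketch below is an illustration only: the function names are hypothetical, and the weights 0.35 are simply 0.7 × 0.5 from the example above.

```python
import math

def std_normal_cdf(x):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def hybrid_cdf(z):
    """CDF of f(Z) = 0.3 N_Z(2.0, 1.0) + 0.35 delta_1.0(Z) + 0.35 delta_2.0(Z)."""
    gaussian_part = 0.3 * std_normal_cdf((z - 2.0) / math.sqrt(1.0))
    atom_at_1 = 0.35 if z >= 1.0 else 0.0   # point mass contributed by st(b) = 1
    atom_at_2 = 0.35 if z >= 2.0 else 0.0   # point mass contributed by st(b) = 2
    return gaussian_part + atom_at_1 + atom_at_2

for z in (0.0, 1.0, 2.0, 5.0):
    print(z, hybrid_cdf(z))
```

The CDF rises continuously because of the Gaussian component and jumps by 0.35 at each of the two discrete values, so the total mass is 1.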

We implemented the extended inference algorithm presented in this paper in
the XSB logic programming system (Swift et al. 2012). The system is available at
http://www.cs.sunysb.edu/~cram/contdist. This proof-of-concept prototype is
implemented as a meta-interpreter and currently supports discrete and Gaussian
distributions. The meanings of various probabilistic predicates (e.g., msw, values,
set_sw) in the system are similar to those of the PRISM system. This implementation
illustrates how the inference algorithm specializes to the specialized techniques that
have been developed for several popular statistical models such as HMM, FMM,
Hybrid Bayesian Networks and Kalman Filters. Integration of the inference algorithm
in XSB and its performance evaluation are topics of future work.

Acknowledgments. We thank the reviewers for valuable comments. This research 
was supported in part by NSF Grants CCF-1018459, CCF-0831298, and ONR Grant 
N00014-07-1-0928. 

8 Appendix 

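This section presents the proof of Proposition 1.

Property 6

Integrated form of a PPDF function with respect to a variable $V$ is a PPDF function,
i.e.,

$$\alpha \int_{-\infty}^{\infty} \prod_{k=1}^{m} \mathcal{N}_{(a_k \cdot X_k + b_k)}(\mu_k, \sigma_k^2)\, dV = \alpha' \prod_{l=1}^{m'} \mathcal{N}_{(a'_l \cdot X'_l) + b'_l}(\mu'_l, \sigma'^2_l)$$

where $V \in X_k$ and $V \notin X'_l$.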
(Proof)

The above proposition states that the integrated form of a product of Gaussian PDF
functions with respect to a variable is a product of Gaussian PDF functions. We
first prove it for a simple case involving two standard Gaussian PDF functions, and
then generalize it for an arbitrary number of Gaussians.

For simplicity, let us first compute the integrated form of
$\mathcal{N}_{V - X_1}(0, 1) \cdot \mathcal{N}_{V - X_2}(0, 1)$ w.r.t. variable $V$, where $X_1, X_2$ are linear
combinations of variables (except $V$). We make the following two assumptions:

1. The coefficient of $V$ is 1 in both PDFs.

2. Both PDFs are standard normal distributions (i.e., $\mu = 0$ and $\sigma^2 = 1$).

Let $\phi$ denote the integrated form, i.e.,

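$$\begin{aligned}
\phi &= \int_{-\infty}^{\infty} \mathcal{N}_{V - X_1}(0, 1) \cdot \mathcal{N}_{V - X_2}(0, 1)\, dV \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(V - X_1)^2}{2}\right) \cdot \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(V - X_2)^2}{2}\right) dV \\
&= \int_{-\infty}^{\infty} \frac{1}{2\pi} \exp\left(-\frac{1}{2}\left[(V - X_1)^2 + (V - X_2)^2\right]\right) dV \\
&= \int_{-\infty}^{\infty} \frac{1}{2\pi} \exp\left(-\frac{1}{2}\eta\right) dV
\end{aligned}$$

Now

$$\begin{aligned}
\eta &= (V - X_1)^2 + (V - X_2)^2 \\
&= 2V^2 - 2V(X_1 + X_2) + (X_1^2 + X_2^2) \\
&= 2\left[\left(V - \frac{X_1 + X_2}{2}\right)^2 + g\right]
\end{aligned}$$

where

$$g = \frac{X_1^2 + X_2^2}{2} - \left(\frac{X_1 + X_2}{2}\right)^2 = \frac{1}{4}(X_1 - X_2)^2.$$

Thus the integrated form can be expressed as

$$\begin{aligned}
\phi &= \int_{-\infty}^{\infty} \frac{1}{2\pi} \exp\left(-\frac{1}{2} \cdot 2\left[\left(V - \frac{X_1 + X_2}{2}\right)^2 + g\right]\right) dV \\
&= \int_{-\infty}^{\infty} \frac{1}{2\pi} \exp\left(-\frac{1}{2} \cdot 2\left(V - \frac{X_1 + X_2}{2}\right)^2\right) \exp\left(-\frac{1}{2} \cdot 2g\right) dV \\
&= \frac{1}{2\sqrt{\pi}} \exp(-g) \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi}} \exp\left(-\left(V - \frac{X_1 + X_2}{2}\right)^2\right) dV \\
&= \frac{1}{2\sqrt{\pi}} \exp(-g) \qquad \text{(as integration over the whole area is 1)} \\
&= \frac{1}{\sqrt{2}\sqrt{2\pi}} \exp\left(-\frac{1}{4}(X_1 - X_2)^2\right) \\
&= \mathcal{N}_{X_1 - X_2}(0, 2)
\end{aligned}$$

Thus the integrated form of a PPDF function is another PPDF function. Notice that
the integrated form is a constant when $X_1 = X_2$.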

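Generalization for arbitrary number of PDFs. Note that for any arbitrary number
of PDFs in a PPDF function, $\eta = \sum_{i=1}^{n} (V - X_i)^2$ can always be written as
$n\left[\left(V - \frac{1}{n}\sum_{i=1}^{n} X_i\right)^2 + g_n\right]$ where

$$g_n = \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \frac{1}{n^2}\left(\sum_{i=1}^{n} X_i\right)^2.$$

For any arbitrary number of PDFs, we will prove the property on $g_n$. In other
words, we will show that $g_n$ can be expressed as

$$g_n = \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \frac{1}{n^2}\left(\sum_{i=1}^{n} X_i\right)^2 = \frac{1}{n^2} \sum_{i \neq j,\, i < j} (X_i - X_j)^2 \quad (3)$$

which means the integrated form of $n$ PDFs,

$$\phi_n = \int_{-\infty}^{\infty} \prod_{i=1}^{n} \mathcal{N}_{V - X_i}(0, 1)\, dV$$

can be expressed as

$$\phi_n = \alpha \exp\left(-\frac{n}{2} g_n\right) = \alpha' \prod_{i \neq j,\, i < j} \mathcal{N}_{X_i - X_j}(0, n).$$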

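Proposition 7

Let $f_n = \sum_{i=1}^{n} X_i$. Then $f_n^2 = \sum_{i=1}^{n} X_i^2 + \sum_{i \neq j} X_i X_j$.

Proof

We prove the proposition using induction. Let us assume that the above equation
holds for $n$ variables. Now for the $(n+1)$-th variable $X_{n+1}$,

$$\begin{aligned}
f_{n+1}^2 &= \left(\sum_{i=1}^{n+1} X_i\right)^2 \\
&= \left(\left(\sum_{i=1}^{n} X_i\right) + X_{n+1}\right)^2 \\
&= \left(\sum_{i=1}^{n} X_i\right)^2 + X_{n+1}^2 + 2(X_1 + \ldots + X_n) X_{n+1} \\
&= \sum_{i=1}^{n} X_i^2 + \sum_{i \neq j,\, i=1}^{n} X_i X_j + X_{n+1}^2 + 2(X_1 + \ldots + X_n) X_{n+1} \qquad \text{(using induction hypothesis)} \\
&= \sum_{i=1}^{n+1} X_i^2 + \sum_{i \neq j} X_i X_j
\end{aligned}$$

□

Now going back to proving Equation 3, we first show that $g_n$ can be written in
the following form:

$$g_n = \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \frac{1}{n^2}\left(\sum_{i=1}^{n} X_i\right)^2 = \frac{1}{n^2}\left[(n - 1) \sum_{i=1}^{n} X_i^2 - \sum_{i \neq j} X_i X_j\right]$$

The above equation can be proved by induction. It is easy to see that for $n = 2$ the
equation holds, as $g_2 = \frac{1}{2} \sum_{i=1}^{2} X_i^2 - \frac{1}{4}(X_1 + X_2)^2 = \frac{1}{4}\left[X_1^2 + X_2^2 - X_1 X_2 - X_2 X_1\right]$.

Now

$$\begin{aligned}
g_{n+1} &= \frac{1}{n+1} \sum_{i=1}^{n+1} X_i^2 - \frac{1}{(n+1)^2}\left(\sum_{i=1}^{n+1} X_i\right)^2 \\
&= \frac{1}{(n+1)^2}\left[(n+1) \sum_{i=1}^{n+1} X_i^2 - \sum_{i=1}^{n+1} X_i^2 - \sum_{i \neq j} X_i X_j\right] \qquad \text{(using Proposition 7)} \\
&= \frac{1}{(n+1)^2}\left[n \sum_{i=1}^{n+1} X_i^2 - \sum_{i \neq j} X_i X_j\right]
\end{aligned}$$

Thus $g_n = \frac{1}{n^2}\left[(n - 1) \sum_{i=1}^{n} X_i^2 - \sum_{i \neq j} X_i X_j\right]$.
Finally, we will prove that

$$g_n = \frac{1}{n^2}\left[(n - 1) \sum_{i=1}^{n} X_i^2 - \sum_{i \neq j} X_i X_j\right] = \frac{1}{n^2} \sum_{i \neq j,\, i < j} (X_i - X_j)^2$$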

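Proposition 8

Let $h_n = \sum_{i \neq j,\, i < j} (X_i - X_j)^2$. Then $h_n = (n - 1) \sum_{i=1}^{n} X_i^2 - \sum_{i \neq j} X_i X_j$.

We use induction to prove the above proposition. Assume that the claim holds for $n$
variables. Then for the $(n+1)$-th variable,

$$\begin{aligned}
h_{n+1} &= h_n + \sum_{i=1}^{n} (X_i - X_{n+1})^2 \\
&= (n - 1) \sum_{i=1}^{n} X_i^2 - \sum_{i \neq j} X_i X_j + \sum_{i=1}^{n} X_i^2 + n X_{n+1}^2 - 2 \sum_{i=1}^{n} X_i X_{n+1} \\
&= n \sum_{i=1}^{n+1} X_i^2 - \sum_{i \neq j,\, i=1}^{n+1} X_i X_j.
\end{aligned}$$

□

Thus $g_n = \frac{1}{n^2} h_n = \frac{1}{n^2} \sum_{i \neq j,\, i < j} (X_i - X_j)^2$. Thus $\phi_n$ can be expressed, for some constant $\alpha'$, as

$$\phi_n = \alpha' \prod_{i \neq j,\, i < j} \mathcal{N}_{X_i - X_j}(0, n).$$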
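
The identity between $g_n$ and the pairwise differences, and the resulting form of $\phi_n$, can be spot-checked numerically. In the sketch below the test values of $X_i$ are arbitrary, the function names are hypothetical, and the explicit constant $(2\pi)^{-(n-1)/2}\, n^{-1/2}$ in `phi_closed_form` is worked out here from the completing-the-square step rather than quoted from the text.

```python
import math
from itertools import combinations

def normal_pdf(x, mean, var):
    """Gaussian density with the given mean and variance, evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(f, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

def g_n(xs):
    """g_n = (1/n) * sum(X_i^2) - (1/n^2) * (sum X_i)^2."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

def h_n(xs):
    """h_n = sum over pairs i < j of (X_i - X_j)^2."""
    return sum((a - b) ** 2 for a, b in combinations(xs, 2))

def phi_numeric(xs):
    """Integrate the product of N_{V - X_i}(0, 1) over V by quadrature."""
    def integrand(v):
        prod = 1.0
        for x in xs:
            prod *= normal_pdf(v - x, 0.0, 1.0)
        return prod
    return integrate(integrand, -30.0, 30.0)

def phi_closed_form(xs):
    """alpha * exp(-(n/2) * g_n), with alpha = (2*pi)^(-(n-1)/2) / sqrt(n)."""
    n = len(xs)
    alpha = (2.0 * math.pi) ** (-(n - 1) / 2.0) / math.sqrt(n)
    return alpha * math.exp(-0.5 * n * g_n(xs))

xs = [0.3, -1.2, 2.0]   # arbitrary illustrative values
print(g_n(xs), h_n(xs) / len(xs) ** 2)
print(phi_numeric(xs), phi_closed_form(xs))
```

For $n = 2$ the closed form reduces to $\mathcal{N}_{X_1 - X_2}(0, 2)$, in agreement with the two-PDF case derived earlier in this section.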

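Integrated-form with arbitrary constants: For any arbitrary mean, variance and
coefficients of $V$,

$$\int_{-\infty}^{\infty} \mathcal{N}_{a_1 V - X_1}(\mu_1, \sigma_1^2) \cdot \mathcal{N}_{a_2 V - X_2}(\mu_2, \sigma_2^2)\, dV = \mathcal{N}_{a_2 X_1 - a_1 X_2}(a_1\mu_2 - a_2\mu_1,\; a_2^2\sigma_1^2 + a_1^2\sigma_2^2)$$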



k=\ "A; lli^fc,i = l '^l 

Note that the normalization constant is also adjusted appropriately in the
integrated form.